4

Python in Comparison with Other Languages

Many programmers come to Python with prior experience of other programming languages. Often, they are already familiar with the programming idioms of those languages and try to replicate them in Python. But as every programming language is unique, transplanting such foreign idioms often leads to overly verbose or suboptimal code.

The classic example of a foreign idiom often used by inexperienced programmers is iteration over lists. Someone who is familiar with arrays in the C language could write Python code similar to the following example:

for index in range(len(some_list)):
    print(some_list[index])

An experienced Pythonic programmer would most probably write:

for item in some_list:
    print(item)

Programming languages are often classified by paradigms, which can be understood as cohesive sets of features supporting certain "styles of programming." Python is a multiparadigm language and, thanks to this, it shares many similarities with a vast number of other programming languages. As a result, you can write and structure your Python code almost the same way you would in Java, C++, or any other mainstream programming language.

Unfortunately, often that won't be as effective as using well-recognized Python patterns. Knowing native idioms allows you to write more readable and efficient code.

This chapter is aimed at programmers experienced with other programming languages. We will review some of the important features of Python together with idiomatic ways of solving common problems. We will also see how these compare to other programming languages and what common pitfalls lurk for seasoned programmers who are just starting their Python journey. Along the way, we will cover the following topics:

  • Class model and object-oriented programming
  • Dynamic polymorphism
  • Data classes
  • Functional programming
  • Enumerations

Let's begin by considering the technical requirements.

Technical requirements

The code files for this chapter can be found at https://github.com/PacktPublishing/Expert-Python-Programming-Fourth-Edition/tree/main/Chapter%204.

Class model and object-oriented programming

The most prevalent paradigm of Python is object-oriented programming (also known as OOP). It is centered around objects that encapsulate data (in the form of object attributes) and behavior (in the form of methods). OOP is probably one of the most diverse paradigms. It has many styles, flavors, and implementations that have been developed over many years of programming history. Python takes inspiration from many other languages, so in this section, we will take a look at the implementation of OOP in Python through the prism of different languages.

To facilitate code reuse, extensibility, and modularity, OOP languages usually provide a means for either class composition or inheritance. Python is no different and like many other object-oriented languages supports the subclassing of types.

Python may not have as many object-oriented features as other OOP languages, but it has a pretty flexible data and class model that allows you to implement most OOP patterns with extreme elegance. Also, everything in Python is an object, including functions, class definitions, and basic values such as integers, floats, Booleans, and strings.

If we would like to find another popular programming language that has similar object-oriented syntax features and a similar data model, one of the closest matches would probably be Kotlin, a language that runs (mostly) on the Java Virtual Machine (JVM). The following are the similarities between Kotlin and Python:

  • A convenient way to call methods of super-classes: Kotlin provides the super keyword and Python provides the super() function to explicitly reference methods or attributes of super-classes.
  • An expression for object self-reference: Kotlin provides the this expression, which always references the current object of the class. In Python, the first argument of the method is always an instance reference. By convention, it is named self.
  • Support for creating data classes: Like Python, Kotlin provides data classes as "syntactic sugar" over classic class definitions to simplify the creation of class-based data structures that are not supposed to convey a lot of behavior.
  • The concept of properties: Kotlin allows you to define class property setters and getters as functions. Python provides the property() decorator with a similar purpose, together with the concept of descriptors, which allows you to fully customize the attribute access of an object.

What makes Python really stand out in terms of OOP implementation is the approach to inheritance. Python, unlike Kotlin and many other languages, freely permits multiple inheritance (although it often isn't a good idea). Other languages often do not allow this or provide some constraints. Another important Python differentiator is the lack of private/public keywords that would control access to internal object attributes outside of the class definition.

Let's take a closer look at a feature that Python shares with Kotlin and some other JVM-based programming languages, which is access to super-classes through the super() call.

Accessing super-classes

There are multiple ways of encapsulating object behavior in OOP languages, but one of the most common is the use of classes. Python's OOP implementation is based precisely on the concept of classes and subclassing.

Subclassing is a convenient way of reusing existing classes by enhancing or specializing their behavior. Subclasses often rely on the behavior of their base classes but extend them with additional methods or provide completely new implementations for existing methods by overriding their definitions.

But overriding methods without access to their original implementations within the subclass would not facilitate code reuse at all. That's why Python offers the super() function, which returns a proxy object to the method implementations in all base classes. To better understand the potential of the super() function, let's imagine we want to subclass a Python dictionary type to allow access to the stored keys through a case-insensitive key lookup. You could use this, for instance, to store HTTP protocol header values as the HTTP protocol specification states that header names are case-insensitive.

The following is a simple example of implementing such a structure in Python through subclassing:

from collections import UserDict
from typing import Any
class CaseInsensitiveDict(UserDict):
    def __setitem__(self, key: str, value: Any):
        return super().__setitem__(key.lower(), value) 
    def __getitem__(self, key: str) -> Any:
        return super().__getitem__(key.lower())
    def __delitem__(self, key: str) -> None:
        return super().__delitem__(key.lower())

Our implementation of CaseInsensitiveDict relies on collections.UserDict instead of the built-in dict type. Although inheriting from the dict type is possible, we would quickly run into inconsistencies because the built-in dict type doesn't always call __setitem__() to update its state. Most importantly, it won't be used during object initialization or on update() method calls. Similar problems can arise when subclassing the list type. That's why good practice dictates using collections.UserDict when subclassing dictionary behavior and collections.UserList when subclassing list behavior.
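To see that inconsistency in action, consider the following minimal sketch (the naive LowerDict class is hypothetical and used only for illustration):

class LowerDict(dict):
    """Naive dict subclass lowercasing keys on item assignment."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


d = LowerDict({"Content-Type": "application/json"})
print(list(d))  # ['Content-Type']: __init__() bypassed __setitem__()
d["X-Powered-By"] = "Python"
print(list(d))  # ['Content-Type', 'x-powered-by']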

The core of modified dictionary behavior happens in __getitem__(self, item: str) and __setitem__(self, key: str, value: Any). These are methods responsible respectively for accessing dictionary elements using dictionary[key] and setting dictionary values using the dictionary[key] = value syntax. The typing annotations help us to denote that keys should be strings but values can be any Python type.

__setitem__() is responsible for storing and modifying dictionary values. It would not make sense to subclass the base dictionary type and not leverage its internal key-value storage. That's why we use super().__setitem__() to invoke the original set-item implementation. But before we allow the value to be stored, we transform the key to lowercase using the str.lower() method. That way we ensure that all keys stored in the dictionary will always be lowercase.

The __getitem__() method is analogous to the __setitem__() implementation. We know that every key is transformed to lowercase before being stored in the dictionary, so when a key lookup occurs, we can transform the lookup key to lowercase as well. If the super-class implementation of the __getitem__() method raises a KeyError, we can be sure that there is no case-insensitive match in the dictionary.

Last but not least, the __delitem__() method deletes existing dictionary keys. It uses the same technique to transform a key to lowercase and invoke super-class implementation. Thanks to this, we will be able to remove dictionary keys using the del dictionary[key] statement.

The following transcript shows a case-insensitive key lookup of our class in action:

>>> headers = CaseInsensitiveDict({
...     "Content-Length": 30,
...     "Content-Type": "application/json",
... })
>>> headers["CONTENT-LENGTH"]
30
>>> headers["content-type"]
'application/json'

The above use case for the super() function should be simple to follow and understand, but things get a bit more complex when multiple inheritance is involved. Python allows you to use multiple inheritance by introducing the Method Resolution Order (MRO). We will take a closer look at it in the next section.

Multiple inheritance and Method Resolution Order

Python MRO is based on C3 linearization, the deterministic MRO algorithm originally created for the Dylan programming language. The C3 algorithm builds the linearization of a class, also called precedence, which is an ordered list of the ancestors. This list is used to seek an attribute in a class inheritance tree.

You can find more information about the Dylan programming language at http://opendylan.org and Wikipedia has a great article on C3 linearization that can be found at https://en.wikipedia.org/wiki/C3_linearization.

Python didn't have the C3 linearization algorithm as its MRO from the beginning. It was introduced in Python 2.3 together with a common base type for all objects (that is, the object type). Before the change to the C3 linearization method, if a class had two ancestors (refer to Figure 4.1), the order in which methods were resolved was only easy to compute and track for simple cases that didn't use a multiple inheritance model in a cascading way.

The following is an example of a simple multiple inheritance pattern that would not require any special MRO:

class Base1:
    pass


class Base2:
    def method(self):
        print("Base2.method() called")


class MyClass(Base1, Base2):
    pass
     

Before Python 2.3, that would be a simple depth-first search over a class hierarchy tree. In other words, when MyClass().method() is called, the interpreter looks for the method in MyClass, then Base1, and then eventually finds it in Base2.

Figure 4.1: Classical hierarchy

When we introduce a CommonBase class at the top of our class hierarchy (refer to Figure 4.2), things will get more complicated:

class CommonBase:
    pass


class Base1(CommonBase):
    pass


class Base2(CommonBase):
    def method(self):
        print("Base2.method() called")


class MyClass(Base1, Base2):
    pass

As a result, a simple resolution order that follows the left-to-right, depth-first rule would climb back to the top of the hierarchy through the Base1 class before ever looking into the Base2 class. Such an algorithm produces counterintuitive results: the method that gets executed is not necessarily the one that is closest in the inheritance tree.

Figure 4.2: The diamond class hierarchy

Such an inheritance scenario (known as the diamond class hierarchy) is rather uncommon for custom-built classes. The standard library typically does not structure the inheritance hierarchies in this way, and many developers think that it is bad practice. It is possible with Python anyway and thus requires a well-defined and clear handling strategy.

Also, starting from Python 2.3, object sits at the top of the type hierarchy for classes. Essentially, every class becomes part of a large diamond-shaped inheritance hierarchy, so the resolution problem had to be solved on the C side of the language as well. That's why Python now uses C3 linearization as its MRO algorithm.

In Python 2, classes inheriting from the object type were called new-style classes. Classes did not inherit implicitly from object. In Python 3, every class is a new-style class, and old-style classes are no longer available.

The original reference document of the Python MRO written by Michele Simionato describes linearization using the following words:

The linearization of C is the sum of C plus the merge of the linearizations of the parents and the list of the parents.

The Michele Simionato reference document explaining Python's MRO in great detail can be found at http://www.python.org/download/releases/2.3/mro.

The above simply means that C3 is a recursive algorithm. The C3 symbolic notation applied to our earlier inheritance example is as follows:

L[MyClass(Base1, Base2)] = 
        [MyClass] + merge(L[Base1], L[Base2], [Base1, Base2]) 

Here, L[MyClass] is the linearization of MyClass, and merge is a specific algorithm that merges several linearization results.

The merge algorithm is responsible for removing the duplicates and preserving the correct ordering. It uses the concept of list head and tail. The head is the first element of the list and the tail is the rest of the list following the head. Simionato describes the merge algorithm like this (adapted to our example):

Take the head of the first list, that is, L[Base1][0]; if this head is not in the tail of any of the other lists, then add it to the linearization of MyClass and remove it from the lists in the merge, otherwise look at the head of the next list and take it, if it is a good head.
Then, repeat the operation until all the classes are removed or it is impossible to find good heads. In this case, it is impossible to construct the merge; Python 2.3 will refuse to create the MyClass class and will raise an exception.

In other words, C3 does a recursive depth lookup on each parent to get a sequence of lists. Then, it applies a left-to-right rule to merge all the lists, disambiguating the hierarchy whenever a class appears in several lists.
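We can also observe what happens when the merge is impossible to construct. The following sketch (hypothetical A, B, and C classes used only for illustration) requests a base order that contradicts the linearization of the bases themselves:

class A:
    pass


class B(A):
    pass


# C asks for A before B, but B's own linearization puts B before A.
# No consistent linearization exists, so Python raises a TypeError
# ("Cannot create a consistent method resolution order ...").
try:
    class C(A, B):
        pass
except TypeError as error:
    print(error)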

If we had to calculate the MRO for MyClass manually through a series of symbolic steps, we would first have to unfold all L[class] linearizations:

L[MyClass]
  = [MyClass] + merge(L[Base1], L[Base2], [Base1, Base2])
  = [MyClass] + merge(
      [Base1] + merge(L[CommonBase], [CommonBase]),
      [Base2] + merge(L[CommonBase], [CommonBase]),
      [Base1, Base2]
    )
  = [MyClass] + merge(
      [Base1] + merge([CommonBase] + merge(L[object]), [CommonBase]),
      [Base2] + merge([CommonBase] + merge(L[object]), [CommonBase]),
      [Base1, Base2]
    )

Essentially, the object class has no ancestors, so its C3 linearization is just the single-element list [object]. This means we can continue by substituting [object] for L[object]:

  = [MyClass] + merge(
    [Base1] + merge([CommonBase] + merge([object]), [CommonBase]),
    [Base2] + merge([CommonBase] + merge([object]), [CommonBase]),
    [Base1, Base2]
  )

merge([object]) contains only a single-element list, so it immediately unfolds to [object]:

  = [MyClass] + merge(
    [Base1] + merge([CommonBase, object], [CommonBase]),
    [Base2] + merge([CommonBase, object], [CommonBase]),
    [Base1, Base2]
  )

Now it's time to unfold merge([CommonBase, object], [CommonBase]). The head of the first list is CommonBase. It is not in the tail of other lists. We can immediately remove it from the merge and add it to the outer linearization result:

  = [MyClass] + merge(
    [Base1, CommonBase] + merge([object]),
    [Base2, CommonBase] + merge([object]),
    [Base1, Base2]
  )

We are again left with merge([object]) and we can continue unfolding:

  = [MyClass] + merge(
    [Base1, CommonBase, object],
    [Base2, CommonBase, object],
    [Base1, Base2]
  )

Now we are left with the last merge, which is finally non-trivial. The first head is Base1. It is not found in the tails of other lists. We can remove it from the merge and add it to the outer linearization result:

  = [MyClass, Base1] + merge(
    [CommonBase, object],
    [Base2, CommonBase, object],
    [Base2]
  )

Now the first head is CommonBase. It is found in the tail of the second list [Base2, CommonBase, object]. It means we can't process it at the moment and have to move to the next head, which is Base2. It is not found in the tail of other lists. We can remove it from the merge and add it to the outer linearization result:

  = [MyClass, Base1, Base2] + merge(
    [CommonBase, object],
    [CommonBase, object],
    []
  )

Now, CommonBase is again the first head but this time it is no longer found in other list tails. We can remove it from the merge and add it to the outer linearization result:

  = [MyClass, Base1, Base2, CommonBase] + merge(
    [object],
    [object],
    []
  )

The last merge([object], [object], []) step is trivial. The final linearization result is the following:

 [MyClass, Base1, Base2, CommonBase, object]

You can easily inspect the results of C3 linearization by verifying the __mro__ attribute of any class. The following transcript presents the computed MRO of MyClass:

>>> MyClass.__mro__
(<class '__main__.MyClass'>, <class '__main__.Base1'>, <class '__main__.Base2'>, <class '__main__.CommonBase'>, <class 'object'>) 

The __mro__ attribute of a class (which is read-only) stores the result of the C3 linearization computation. Computation is done when the class definition is loaded. You can also call MyClass.mro() to compute and get the result.
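Note that the mro() method returns a plain list rather than a tuple:

>>> MyClass.mro()
[<class '__main__.MyClass'>, <class '__main__.Base1'>, <class '__main__.Base2'>, <class '__main__.CommonBase'>, <class 'object'>]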

Class instance initialization

An object in OOP is an entity that encapsulates data together with behavior. In Python, data is contained as object attributes, which are simply object variables. Behavior, on the other hand, is represented by methods. That is common to almost every OOP language, but the exact nomenclature is sometimes different. For instance, in C++ and Java, object data is said to be stored in fields. In Kotlin, object data is stored behind properties (although they are a bit more than simple object variables).

What makes Python different from statically typed OOP languages is its approach to object attribute declaration and initialization. In short, Python classes do not require you to define attributes in the class body. A variable comes into existence at the time it is initialized. That's why the canonical way to declare object attributes is through assigning their values during object initialization in the __init__() method:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

That may be confusing for those coming to Python with prior knowledge of statically typed programming languages. In those languages, the declaration of object fields is usually static and lives outside of the object initialization function. That's why programmers with a C++ or Java background often tend to replicate this pattern by assigning some default values as class attributes in the main class body:

class Point:
    x = 0
    y = 0

    def __init__(self, x, y):
        self.x = x
        self.y = y

The above code is a classic example of a foreign language idiom replicated in Python. Above all, it is redundant: the class attribute values will always be shadowed by instance attributes upon initialization. But it is also a dangerous code smell: it can lead to problematic errors if one assigns a mutable type like list or dict as a class attribute.

A code smell is a characteristic of code that may be a sign of a deeper problem. A specific piece of code can be functionally correct and free from errors but can be a stub for future problems. Code smells are usually small architectural deficiencies or unsafe constructs that attract bugs.

The problem comes from the fact that class attributes (attributes assigned outside of any method body) are assigned to the type object and not to its instances. When accessing an attribute with self.attribute, Python first looks up the attribute name in the instance namespace. If that lookup fails, it performs a lookup in the class namespace. When assigning values through self.attribute from within a class method, the behavior is completely different: new values are always assigned in the instance namespace. This is especially troublesome with mutable types, as it may cause an accidental leak of state between class instances.

Because using mutable types as class attributes instead of instance attributes is rather a bad practice, it is hard to come up with code examples that would be practical. But it doesn't mean we can't take a quick look at how it actually works. Consider the following class, which is supposed to aggregate values as a list and track the last aggregated value:

class Aggregator:
    all_aggregated = []
    last_aggregated = None
    def aggregate(self, value):
        self.last_aggregated = value
        self.all_aggregated.append(value)

To see where the problem lies, let's start an interactive session, create two distinct aggregators, and start aggregating elements:

>>> a1 = Aggregator()
>>> a2 = Aggregator()
>>> a1.aggregate("a1-1")
>>> a1.aggregate("a1-2")
>>> a2.aggregate("a2-1")

If we now take a look at the aggregation lists of both instances, we will see very disturbing output:

>>> a1.all_aggregated
['a1-1', 'a1-2', 'a2-1']
>>> a2.all_aggregated
['a1-1', 'a1-2', 'a2-1']

Someone reading the code could think that all Aggregator instances are supposed to track the history of their own aggregations. But we see that instead, all Aggregator instances share the state of the all_aggregated attribute. On the other hand, when looking at the last aggregated values, we see correct values for both aggregators:

>>> a1.last_aggregated
'a1-2'
>>> a2.last_aggregated
'a2-1'

In situations like these, it is easy to solve the mystery by inspecting the unbound class attribute values:

>>> Aggregator.all_aggregated
['a1-1', 'a1-2', 'a2-1']
>>> Aggregator.last_aggregated
>>>

As we see from the above transcript, all Aggregator instances shared their state through the mutable Aggregator.all_aggregated attribute. Something like this could be the intended behavior but very often is just an example of a mistake that is sometimes hard to track down. Due to this fact, all attribute values that are supposed to be unique for every class instance should absolutely be initialized in the __init__() method only.

The fixed version of the Aggregator class would be as follows:

class Aggregator:
    def __init__(self):
        self.all_aggregated = []
        self.last_aggregated = None
    def aggregate(self, value):
        self.last_aggregated = value
        self.all_aggregated.append(value)

We simply moved the initialization of the all_aggregated and last_aggregated attributes to the __init__() method. Now let's repeat the same initialization and aggregation calls as in the previous session:

>>> a1 = Aggregator()
>>> a2 = Aggregator()
>>> a1.aggregate("a1-1")
>>> a1.aggregate("a1-2")
>>> a2.aggregate("a2-1")

If we now inspect the state of Aggregator instances, we will see that they track the history of their aggregations independently:

>>> a1.all_aggregated
['a1-1', 'a1-2']
>>> a2.all_aggregated
['a2-1']

If you really feel the urge to have some kind of declaration of all attributes at the top of the class definition, you can use type annotations as in the following example:

from typing import Any, List
class Aggregator:
    all_aggregated: List[Any]
    last_aggregated: Any
    def __init__(self):
        self.all_aggregated = []
        self.last_aggregated = None
    def aggregate(self, value: Any):
        self.last_aggregated = value
        self.all_aggregated.append(value)

Having class attribute annotations actually isn't a bad practice. They can be used by static type verifiers or IDEs to increase the quality of code and better communicate the intended usage of your class and possible type constraints. Such class attribute annotations are also used to simplify the initialization of data classes, which we will discuss in the Data classes section.

Attribute access patterns

Another thing that sets Python apart from other statically typed object-oriented languages is the lack of the notion of public, private, and protected class members. In other languages, these are often used to restrict or open access to object attributes for code outside of the class. The Python feature that is nearest to this concept is name mangling. Every time an attribute is prefixed by __ (two underscores) within a class body, it is renamed by the interpreter on the fly:

class MyClass: 
    def __init__(self):
        self.__secret_value = 1 

Note that the double underscore pattern is referred to as a "dunder". Refer to the Dunder methods (language protocols) section for more information.

Accessing the __secret_value attribute by its initial name outside of the class will raise an AttributeError exception:

>>> instance_of = MyClass()
>>> instance_of.__secret_value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyClass' object has no attribute '__secret_value'
>>> instance_of._MyClass__secret_value
1

One could think that this is synonymous with private/protected fields and methods commonly found in other OOP languages. It indeed makes it harder to access such attributes outside of the class but doesn't make such access impossible. Private and protected fields and methods in many other OOP languages are a means of providing class encapsulation. They are used to restrict access to specific symbols from anyone outside of a specific class (private) or anyone outside the inheritance tree (protected). In Python, name mangling does not restrict attribute access in any way. It only makes it less convenient.

The purpose of name mangling is to provide an implicit way of avoiding naming collisions. For instance, it may happen that a specific identifier is a perfect fit for a new internal attribute in some subclass. If that name is already taken somewhere up in the inheritance tree, the name clash may result in unexpected behavior.

In such situations, the programmer may decide to use a different name or apply name mangling to resolve the conflict. Still, it is not recommended to use name mangling in base classes by default, just to avoid any collisions in advance.
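The following minimal sketch (hypothetical Base and Derived classes) shows how mangling keeps attributes with the same name separate across the inheritance tree:

class Base:
    def __init__(self):
        self.__counter = 0  # stored as _Base__counter


class Derived(Base):
    def __init__(self):
        super().__init__()
        self.__counter = 100  # stored as _Derived__counter; no clash


instance = Derived()
print(instance._Base__counter, instance._Derived__counter)  # 0 100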

It all boils down to the Python way of doing things. Statically typed languages with private/protected keywords enforce the attribute access restriction. It means that usually there is no way to access such private/protected attributes outside of the class. In Python, it is more common to clearly communicate what the intended use is of each attribute instead of restricting users from doing whatever they want. With or without name mangling, programmers will find a way to access the attribute anyway. So, what's the purpose of making this less convenient for them?

When an attribute is not public, the convention to use is an _ prefix. This does not involve any name mangling algorithm, but just usually documents the attribute as an internal element of the class that is not intended to be used outside of the class context. Many IDEs and style checkers are already aware of this convention and are able to highlight places where such internal members are accessed outside of their class.
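For instance, a class with one public and one internal attribute could be documented as follows (a hypothetical Connection class, shown only to illustrate the convention):

class Connection:
    def __init__(self, dsn):
        self.dsn = dsn       # part of the public interface
        self._socket = None  # internal detail; a convention, not enforced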

Python also has other mechanisms to separate the public part of the class from its private code. Two such mechanisms are descriptors and properties.

Descriptors

A descriptor lets you customize what should be done when you refer to an attribute of an object. Descriptors are the basis of complex attribute access in Python. They are used internally to implement properties, methods, class methods, static methods, and super. They are objects that define how attributes of another class can be accessed. In other words, a class can delegate the management of an attribute to another class.

The descriptor classes are based on three special methods that form the descriptor protocol:

  • __set__(self, obj, value): This is called whenever the attribute is set. In the following examples, we will refer to this as a setter.
  • __get__(self, obj, owner=None): This is called whenever the attribute is read (referred to as a getter).
  • __delete__(self, obj): This is called when del is invoked on the attribute.

A descriptor that implements __set__() or __delete__() is called a data descriptor. If it implements only __get__(), then it is called a non-data descriptor.

Methods of the descriptor protocol are, in fact, called by the object's special __getattribute__() method on every attribute lookup (do not confuse it with __getattr__(), which has a different purpose). Whenever such a lookup is performed, either by using a dotted notation in the form of instance.attribute or by using the getattr(instance, 'attribute') function call, the __getattribute__() method is implicitly invoked and it looks for an attribute in the following order:

  1. It verifies whether the attribute is a data descriptor on the class object of the instance
  2. If not, it looks to see whether the attribute can be found in the __dict__ lookup of the instance object
  3. Finally, it looks to see whether the attribute is a non-data descriptor on the class object of the instance

In other words, data descriptors take precedence over the __dict__ lookup, and the __dict__ lookup takes precedence over non-data descriptors.

To make it clearer, here is a modified example from the official Python documentation that shows how descriptors work on real code:

class RevealAccess(object): 
    """A data descriptor that sets and returns values 
       normally and prints a message logging their access. 
    """ 
 
    def __init__(self, initval=None, name='var'): 
        self.val = initval 
        self.name = name 
 
    def __get__(self, obj, objtype): 
        print('Retrieving', self.name) 
        return self.val 
 
    def __set__(self, obj, val): 
        print('Updating', self.name) 
        self.val = val 
 
    def __delete__(self, obj): 
        print('Deleting', self.name) 
 
class MyClass(object): 
    x = RevealAccess(10, 'var "x"') 
    y = 5

The official guide on using descriptors, together with many examples, can be found at https://docs.python.org/3.9/howto/descriptor.html.

Note that x is defined as a class attribute instead of being assigned in the __init__() method. Descriptors, in order to work, need to be defined as class attributes. Also, they are closer to methods than to normal variable attributes. Here is an example of using the RevealAccess descriptor in the interactive session:

>>> m = MyClass()
>>> m.x
Retrieving var "x"
10
>>> m.x = 20
Updating var "x"
>>> m.x
Retrieving var "x"
20
>>> m.y
5
>>> del m.x
Deleting var "x"

The preceding example clearly shows that, if a class has a data descriptor for the given attribute, then the descriptor's __get__() method is called to return the value every time the instance attribute is retrieved, and __set__() is called whenever a value is assigned to such an attribute. The descriptor's __delete__() method is called whenever an instance attribute is deleted with the del instance.attribute statement or the delattr(instance, 'attribute') call.

The difference between data and non-data descriptors is important for the reasons highlighted at the beginning of the section. Python already uses the descriptor protocol to bind class functions to instances as methods.

Descriptors also power the mechanism behind the classmethod and staticmethod decorators. This is because, in fact, the function objects are non-data descriptors too:

>>> def function(): pass
>>> hasattr(function, '__get__')
True
>>> hasattr(function, '__set__')
False

This is also true for functions created with lambda expressions:

>>> hasattr(lambda: None, '__get__')
True
>>> hasattr(lambda: None, '__set__')
False

So, without __dict__ taking precedence over non-data descriptors, we would not be able to dynamically override specific methods on already constructed instances at runtime. Fortunately, thanks to how descriptors work in Python, it is possible; so, developers may use a popular technique called monkey patching to change the way in which instances work ad hoc without the need for subclassing.

Monkey patching is the technique of modifying the class instance dynamically at runtime by adding, modifying, or deleting attributes without touching the class definition or the source code.
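The following sketch (a hypothetical Dog class) patches a single instance. It works precisely because the instance __dict__ takes precedence over the non-data descriptor that binds the original method:

import types


class Dog:
    def speak(self):
        return "Woof!"


def silent_speak(self):
    return "..."


dog = Dog()
# Bind the replacement function to this instance only.
dog.speak = types.MethodType(silent_speak, dog)
print(dog.speak())    # ...
print(Dog().speak())  # Woof! (other instances are unaffected)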

Real-life example – lazily evaluated attributes

One example usage of descriptors may be to delay the initialization of the class attribute to the moment when it is accessed from the instance. This may be useful if the initialization of such attributes depends on some context that is not yet available at the time the class is imported. The other case is saving resources when such initialization is simply expensive in terms of computing resources but it is not known whether the attribute will be used anyway at the time the class is imported. Such a descriptor could be implemented as follows:

class InitOnAccess:
    def __init__(self, init_func, *args, **kwargs):
        self.klass = init_func
        self.args = args
        self.kwargs = kwargs
        self._initialized = None

    def __get__(self, instance, owner):
        if self._initialized is None:
            print('initialized!')
            self._initialized = self.klass(
                *self.args, **self.kwargs
            )
        else:
            print('cached!')
        return self._initialized

The InitOnAccess descriptor class includes some print() calls that allow us to see whether values were initialized on access or accessed from the cache.

Let's imagine we want to have a class where all instances have access to a shared list of sorted random values. The length of the list could be arbitrarily long, so it makes sense to reuse it for all instances. On the other hand, sorting very long input can be time-consuming. That's why the InitOnAccess class will make sure that such a list will be initialized only on first access. Our class definition could be as follows:

import random
class WithSortedRandoms:
    lazily_initialized = InitOnAccess(
        sorted,
        [random.random() for _ in range(10)]
    )

Note that we used fairly small input to the range() function to make the output readable. Here is an example usage of the WithSortedRandoms class in an interactive session:

>>> m = WithSortedRandoms()
>>> m.lazily_initialized
initialized!
[0.2592159616928279, 0.32590583255950756, 0.4015520901807743, 0.4148447834912816, 0.4187058605495758, 0.4534290894962043, 0.4796775578337028, 0.6963642650184283, 0.8449725511007807, 0.8808174325885045] 
>>> m.lazily_initialized
cached!
[0.2592159616928279, 0.32590583255950756, 0.4015520901807743, 0.4148447834912816, 0.4187058605495758, 0.4534290894962043, 0.4796775578337028, 0.6963642650184283, 0.8449725511007807, 0.8808174325885045] 

The official OpenGL Python library available on PyPI under the PyOpenGL name uses a similar technique to implement a lazy_property object that is both a decorator and a data descriptor:

class lazy_property(object): 
    def __init__(self, function): 
        self.fget = function 
 
    def __get__(self, obj, cls): 
        value = self.fget(obj) 
        setattr(obj, self.fget.__name__, value) 
        return value 

The setattr() function allows you to set an attribute of an object instance using a name passed as an argument; here, it is self.fget.__name__. It is constructed like that because the lazy_property descriptor is supposed to be used as a decorator of the method that acts as the provider of the initialized value, as in the following example:

class WithSortedRandoms:
    @lazy_property
    def lazily_initialized(self):
        return sorted([random.random() for _ in range(5)])

Such an implementation is similar to using the property decorator described in the next section. The function that is wrapped with it is executed only once and then the instance attribute is replaced with a value returned by that function property. This instance attribute takes precedence over the descriptor (the class attribute) so no more initializations will be performed on the given class instance. This technique is often useful when there's a need to fulfill the following two requirements at the same time:

  • An object instance needs to be stored as a class attribute that is shared between its instances (to save resources)
  • This object cannot be initialized at the time of import because its creation process depends on some global application state/context

In the case of applications written using OpenGL, you can encounter this kind of situation very often. For example, the creation of shaders in OpenGL is expensive because it requires a compilation of code written in OpenGL Shading Language (GLSL). It is reasonable to create them only once, and, at the same time, include their definition in close proximity to classes that require them. On the other hand, shader compilations cannot be performed without OpenGL context initialization, so it is hard to define and compile them reliably in a global module namespace at the time of import.

The following example shows the possible usage of the modified version of PyOpenGL's lazy_property decorator (here, lazy_class_attribute) in some imaginary OpenGL-based application. The highlighted change to the original lazy_property decorator was required in order to allow the attribute to be shared between different class instances:

import OpenGL.GL as gl 
from OpenGL.GL import shaders 
 
 
class lazy_class_attribute(object): 
    def __init__(self, function): 
        self.fget = function 
 
    def __get__(self, obj, cls):
        value = self.fget(cls)
        # note: storing in the class object, not its instance,
        #       no matter if it's a class-level or
        #       instance-level access
        setattr(cls, self.fget.__name__, value)
        return value 
  
class ObjectUsingShaderProgram(object): 
    # trivial pass-through vertex shader implementation 
    VERTEX_CODE = """ 
        #version 330 core 
        layout(location = 0) in vec4 vertexPosition; 
        void main(){ 
            gl_Position =  vertexPosition; 
        } 
    """ 
    # trivial fragment shader that results in everything 
    # drawn with white color 
    FRAGMENT_CODE = """ 
        #version 330 core 
        out lowp vec4 out_color; 
        void main(){ 
            out_color = vec4(1, 1, 1, 1); 
        } 
    """ 
 
    @lazy_class_attribute 
    def shader_program(self): 
        print("compiling!") 
        return shaders.compileProgram( 
            shaders.compileShader( 
                self.VERTEX_CODE, gl.GL_VERTEX_SHADER 
            ), 
            shaders.compileShader( 
                self.FRAGMENT_CODE, gl.GL_FRAGMENT_SHADER 
            ) 
        ) 

Like every advanced Python syntax feature, this one should also be used with caution and documented well in code. Descriptors affect the very basic part of class behavior. For inexperienced developers, the altered class behavior might be very confusing and unexpected. Because of that, it is very important to make sure that all your team members are familiar with descriptors and understand this concept well if it plays an important role in your project's code base.

Properties

Anyone who has programmed in C++ or Java for a while should probably be familiar with the term encapsulation. It is a means of protecting direct access to class fields coming from the assumption that all internal data held by a class should be considered private. In a fully encapsulated class, as few methods as possible should be exposed as public. Any write or read access to an object's state should be exposed through setter and getter methods that are able to guard proper usage. In Java, for instance, this pattern can look as follows:

public class UserAccount {
  private String username;
  public String getUsername() {
    return username;
  }
  public void setUsername(String newUsername) {
    this.username = newUsername;
  }
}

The getUsername() method is a username getter and the setUsername() method is a username setter. The premise is quite good. By hiding access to class members behind getters and setters (also known as accessors and mutators), you are able to guard the right access to internal class values (let's say, perform validation on setters). You are also creating an extension point in the class public API that can be potentially enriched with additional behavior whenever there is such a need without breaking the backward compatibility of the class API.

Let's imagine that you have a class for a user account that, among others, stores the user's password. If you would like to emit audit logs whenever a password is accessed, you could either make sure that every place in your code that accesses user passwords has proper audit log calls or proxy all access to password entry through a set of setter and getter methods that have the logging call added by default.

The problem is that you can never be sure what will require an additional extension in the future. This simple fact often leads to over-encapsulation and a never-ending litany of setter and getter methods for every possible field that could otherwise be public. They are simply tedious to write, and way too often provide little to no benefit and just reduce the signal-to-noise ratio.

Thankfully, Python has a completely different approach to the accessor and mutator pattern through the mechanism of properties. Properties allow you to freely expose public members of classes and simply convert them to getter and setter methods whenever there is such a need. And you can do that completely without breaking the backward compatibility of your class API. Consider the example of an encapsulated UserAccount class that does not use the feature of properties:

class UserAccount:
    def __init__(self, username, password):
        self._username = username
        self._password = password
    def get_username(self):
        return self._username
    def set_username(self, username):
        self._username = username
    def get_password(self):
        return self._password
    def set_password(self, password):
        self._password = password

Whenever you see code like the above, which can be recognized by the abundance of get_ and set_ methods, you can be almost 100% sure that you're dealing with a foreign language idiom. That's something that a C++ or Java programmer could write. A seasoned Python programmer would rather write the following:

class UserAccount:
    def __init__(self, username, password):
        self.username = username
        self.password = password

Only when there is an actual need to hide a specific field behind a property, and not sooner, would an experienced programmer provide the following modification:

class UserAccount:
    def __init__(self, username, password):
        self.username = username
        self._password = password
    
    @property
    def password(self):
        return self._password
    @password.setter
    def password(self, value):
        self._password = value
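Returning to the audit-log scenario mentioned earlier, the getter and setter can now be extended without touching any calling code. The following is a sketch using the standard logging module (the audit logger name is our own assumption):

import logging

audit_log = logging.getLogger("audit")


class UserAccount:
    def __init__(self, username, password):
        self.username = username
        self._password = password

    @property
    def password(self):
        audit_log.info("Password of %r was read", self.username)
        return self._password

    @password.setter
    def password(self, value):
        audit_log.info("Password of %r was changed", self.username)
        self._password = value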

Properties provide a built-in descriptor type that knows how to link an attribute to a set of methods. The property() function takes four optional arguments: fget, fset, fdel, and doc. The last one can be provided to define a docstring that is linked to the attribute as if it were a method. Here is an example of a Rectangle class that can be controlled either by direct access to the attributes that store two corner points or by using the width and height properties:

class Rectangle: 
    def __init__(self, x1, y1, x2, y2): 
        self.x1, self.y1 = x1, y1 
        self.x2, self.y2 = x2, y2 
 
    def _width_get(self): 
        return self.x2 - self.x1 
 
    def _width_set(self, value): 
        self.x2 = self.x1 + value 
 
    def _height_get(self): 
        return self.y2 - self.y1 
 
    def _height_set(self, value): 
        self.y2 = self.y1 + value 
 
    width = property( 
        _width_get, _width_set, 
        doc="rectangle width measured from left" 
    ) 
    height = property( 
        _height_get, _height_set, 
        doc="rectangle height measured from top" 
    ) 
 
    def __repr__(self): 
        return "{}({}, {}, {}, {})".format( 
            self.__class__.__name__, 
            self.x1, self.y1, self.x2, self.y2 
        )

The following is an example of such defined properties in an interactive session:

>>> rectangle = Rectangle(10, 10, 25, 34)
>>> rectangle.width, rectangle.height
(15, 24)
>>> rectangle.width = 100
>>> rectangle
Rectangle(10, 10, 110, 34)
>>> rectangle.height = 100
>>> rectangle
Rectangle(10, 10, 110, 110)
>>> help(Rectangle)
Help on class Rectangle
    
class Rectangle(builtins.object)
 |  Methods defined here:
 |  
 |  __init__(self, x1, y1, x2, y2)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  --------------------------------------------------------
 |  Data descriptors defined here:
 |  (...)
 |  
 |  height
 |      rectangle height measured from top
 |  
 |  width
 |      rectangle width measured from left

Properties make it easier to write descriptors but must be handled carefully when inheritance is involved. The property object is created using the methods of the class in which it is defined, and it will not use methods that are overridden in derived classes.

For instance, the following example will fail to override the implementation of the fget method of the parent class's width property:

>>> class MetricRectangle(Rectangle):
...     def _width_get(self):
...         return "{} meters".format(self.x2 - self.x1)
...         
>>> MetricRectangle(0, 0, 100, 100).width
100

In order to resolve this, the whole property simply needs to be overwritten in the derived class:

>>> class MetricRectangle(Rectangle):
...     def _width_get(self):
...         return "{} meters".format(self.x2 - self.x1)
...     width = property(_width_get, Rectangle.width.fset)
...     
>>> MetricRectangle(0, 0, 100, 100).width
'100 meters'  

Unfortunately, the preceding code has some maintainability issues. It can be a source of confusion if the developer decides to change the parent class but forgets to update the property call. This is why overriding only parts of the property behavior is not advised. Instead of relying on the parent class's implementation, it is recommended that you rewrite all the property methods in the derived class if you need to change how they work. In most cases, this is the only option anyway, because usually a change to the property setter behavior implies a change to the behavior of the getter as well.

Because of this, the best syntax for creating properties is to use property as a decorator. This will reduce the number of method signatures inside the class and make the code more readable and maintainable:

class Rectangle:
    def __init__(self, x1, y1, x2, y2):
        self.x1, self.y1 = x1, y1
        self.x2, self.y2 = x2, y2
    @property
    def width(self):
        """rectangle width measured from left"""
        return self.x2 - self.x1
    @width.setter
    def width(self, value):
        self.x2 = self.x1 + value
    @property
    def height(self):
        """rectangle height measured from top"""
        return self.y2 - self.y1
    @height.setter
    def height(self, value):
        self.y2 = self.y1 + value

The best thing about the Python property mechanism is that it can be introduced to a class gradually. You can start by exposing public attributes of the class instance and convert them to properties only if there is such a need. Other parts of your code won't notice any change in the class API because properties are accessed as if they were ordinary instance attributes.

We've so far discussed the object-oriented data model of Python in comparison to different programming languages. But the data model is only a part of the OOP landscape. The other important factor of every object-oriented language is the approach to polymorphism. Python provides a few implementations of polymorphism and that will be the topic of the next section.

Dynamic polymorphism

Polymorphism is a mechanism commonly found in OOP languages. It abstracts the interface of an object from its type. Different programming languages achieve polymorphism through different means. For statically typed languages, it is usually achieved through:

  • Subtyping: Subtypes of type A can be used in every interface that expects type A. Interfaces are defined explicitly, and subtypes/subclasses inherit interfaces of their parents. This is a polymorphism mechanism found in C++.
  • Implicit interfaces: Every type can be used in the interface that expects an interface of type A as long as it implements the same methods (has the same interface) as type A. The declarations of interfaces are still defined explicitly but subclasses/subtypes don't have to explicitly inherit from the base classes/types that define such an interface. This is a polymorphism mechanism found in Go.

Python is a dynamically typed language, so it uses a rather lax mechanism of polymorphism that is often referred to as duck typing. The duck-typing principle says the following:

If it walks like a duck and it quacks like a duck, then it must be a duck.

Application of that principle in Python means that any object can be used within a given context as long as the object works and behaves as the context expects. This typing philosophy is very close to implicit interfaces known in Go, although it does not require any declaration of the expected interfaces of function arguments. Because Python does not enforce types or interfaces of function arguments, it does not matter what types of objects are provided to the function. What matters instead is which methods of those objects are actually used within the function body.

To better understand the concept, consider the following example of a function that is supposed to read a file, print its contents, and close the file afterward:

def printfile(file):
    try:
        contents = file.read()
        print(contents)
    finally:
        file.close()

From the signature of the printfile() function, we can already guess that it expects a file or a file-like object (like StringIO from the io module). But the truth is that this function will consume any object without raising an unexpected exception, as long as the input argument satisfies the following conditions:

  • The file argument has a read() method
  • The result of file.read() is a valid argument to the print() function
  • The file argument has the close() method

The above three points also indicate the three places where polymorphism happens in the above example. Depending on the actual type of the file argument, the printfile() function will use different implementations of the read() and close() methods. The type of the contents variable can also differ depending on the file.read() implementation, in which case the print() function will use a different implementation of the object's string representation.

This approach to polymorphism and typing is really powerful and flexible, although it has some downsides. Due to the lack of type and interface enforcement, it is harder to verify the code's correctness before execution. That's why high-quality applications must rely on extensive code testing with rigorous coverage of every path that code execution can take. Python allows you to partially overcome this problem through type hinting annotations that can be verified with additional tools before runtime.
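For instance, the interface expected by the printfile() function could be spelled out with typing.Protocol (available since Python 3.8). The SupportsReadClose name below is our own invention, not part of the standard library:

from typing import Protocol


class SupportsReadClose(Protocol):
    """Structural type: any object with matching read() and close()."""

    def read(self) -> str: ...
    def close(self) -> None: ...


def printfile(file: SupportsReadClose) -> None:
    try:
        print(file.read())
    finally:
        file.close()

A static type checker such as mypy will accept any argument that structurally matches this protocol, without requiring explicit inheritance, which closely mirrors the duck-typing semantics described above.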

The dynamic type system of Python together with the duck-typing principle creates an implicit and omnipresent form of dynamic polymorphism that makes Python very similar to JavaScript, which also lacks static type enforcement. But there are other forms of polymorphism available to Python developers that are more "classical" and explicit in nature. One of those forms is operator overloading.

Operator overloading

Operator overloading is a specific type of polymorphism that allows the language to have different implementations of specific operators depending on the types of operands.

Operators in many programming languages are already polymorphic. Consider the following expressions that would be valid constructs in Python:

7 * 6
3.14 * 2
["a", "b"] * 3
"abba" * 2

Those expressions in Python would have four different implementations:

  • 7 * 6 is integer multiplication resulting in an integer value of 42
  • 3.14 * 2 is float multiplication resulting in a float value of 6.28
  • ["a", "b"] * 3 is list multiplication resulting in a list value of ['a', 'b', 'a', 'b', 'a', 'b']
  • "abba" * 2 is string multiplication resulting in a string value of 'abbaabba'

The semantics and implementation of all Python operators are already different depending on the types of operands. Python provides multiple built-in types together with various implementations of their operators, but it doesn't mean that every operator can be used with any type.

For instance, the + operator is used for the summation or concatenation of operands. It makes sense to add numeric types like integers or floating-point numbers, as well as to concatenate strings and lists. But this operator can't be used with sets or dictionaries, as such an operation would not make mathematical sense (sets could be either intersected or joined) and the expected result would be ambiguous (which values of two dictionaries should be used in the event of a conflict?).
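Sets instead provide dedicated operators for those mathematical operations, as the following transcript shows:

>>> {1, 2, 3} | {3, 4}
{1, 2, 3, 4}
>>> {1, 2, 3} & {3, 4}
{3}
>>> {1, 2, 3} + {3, 4}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'set' and 'set'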

Operator overloading is just the extension of the built-in polymorphism of operators already included in the programming language. Many programming languages, including Python, allow you to define a new implementation for operand types that didn't have a valid operator implementation or shadow existing implementation through subclassing.

Dunder methods (language protocols)

The Python data model specifies a lot of specially named methods that can be overridden in your custom classes to provide them with additional syntax capabilities. You can recognize these methods by their specific naming conventions that wrap the method name with double underscores. Because of this, they are sometimes referred to as dunder methods. It is simply shorthand for double underscores.

The most common and obvious example of such dunder methods is __init__(), which is used for class instance initialization:

class CustomUserClass:
    def __init__(self, initialization_argument):
        ...

These methods, either alone or when defined in a specific combination, constitute the so-called language protocols. If we say that an object implements a specific language protocol, it means that it is compatible with a specific part of the Python language syntax. The following are the most common protocols within the Python language:

  • Callable protocol: the __call__() method allows objects to be called with parentheses, as in instance()
  • Descriptor protocols: the __set__(), __get__(), and __delete__() methods allow us to manipulate the attribute access pattern of classes (see the Descriptors section)
  • Container protocol: the __contains__() method allows us to test whether an object contains some value using the in keyword, as in value in instance
  • Iterable protocol: the __iter__() method allows objects to be iterated over using the for keyword, as in for value in instance: ...
  • Sequence protocol: the __getitem__() and __len__() methods allow objects to be indexed with square bracket syntax (item = instance[index]) and queried for length using the built-in len() function (length = len(instance))

Each operator available in Python has its own protocol and operator overloading happens by implementing the dunder methods of that protocol. Python provides over 50 overloadable operators that can be divided into five main groups:

  • Arithmetic operators
  • In-place assignment operators
  • Comparison operators
  • Unary operators
  • Bitwise operators

That's a lot of protocols so we won't discuss all of them here. We will instead take a look at a practical example that will allow you to better understand how to implement operator overloading on your own.

A full list of available dunder methods can be found in the Data model section of the official Python documentation available at https://docs.python.org/3/reference/datamodel.html.

All operators are also exposed as ordinary functions in the operator module. The documentation of that module gives a good overview of Python operators. It can be found at https://docs.python.org/3.9/library/operator.html.

Let's assume that we are dealing with a mathematical problem that can be solved through matrix equations. A matrix is a mathematical element of linear algebra with well-defined operations. In the simplest form, it is a two-dimensional array of numbers. Python lacks native support for multi-dimensional arrays other than nesting lists within lists. Because of that, it would be a good idea to provide a custom class that encapsulates matrices and operations between them. Let's start by defining our class and its initialization method:

class Matrix:
    def __init__(self, rows):
        if len(set(len(row) for row in rows)) > 1:
            raise ValueError("All matrix rows must be the same length")
        self.rows = rows

The first dunder method of the Matrix class is __init__(), which allows us to safely initialize the matrix. It accepts a list of matrix rows as its only input argument. As every row needs to have the same number of columns, we iterate over them and verify that they all have the same length.
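A quick interactive session (assuming the class is defined as above) shows the validation at work:

>>> matrix = Matrix([[1, 2], [3, 4]])
>>> matrix.rows
[[1, 2], [3, 4]]
>>> Matrix([[1, 2], [3]])
Traceback (most recent call last):
  ...
ValueError: All matrix rows must be the same length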

Now let's add the first operator overloading:

def __add__(self, other):
    if (
        len(self.rows) != len(other.rows) or
        len(self.rows[0]) != len(other.rows[0])
    ):
        raise ValueError("Matrix dimensions don't match")
    return Matrix([
        [a + b for a, b in zip(a_row, b_row)]
        for a_row, b_row in zip(self.rows, other.rows)
    ])

The __add__() method is responsible for overloading the + (plus sign) operator and here it allows us to add two matrices together. Only matrices of the same dimensions can be added together. This is a fairly simple operation that involves adding all matrix elements one by one to form a new matrix.
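The following transcript shows the element-wise addition in action:

>>> added = Matrix([[1, 2], [3, 4]]) + Matrix([[10, 20], [30, 40]])
>>> added.rows
[[11, 22], [33, 44]]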

The __sub__() method is responsible for overloading the - (minus sign) operator that will be responsible for matrix subtraction. To subtract two matrices, we use a similar technique as for the + operator:

def __sub__(self, other):
    if (
        len(self.rows) != len(other.rows) or
        len(self.rows[0]) != len(other.rows[0])
    ):
        raise ValueError("Matrix dimensions don't match")
    return Matrix([
        [a - b for a, b in zip(a_row, b_row)]
        for a_row, b_row in zip(self.rows, other.rows)
    ])

And the following is the last method we add to our class:

def __mul__(self, other):
    if not isinstance(other, Matrix):
        raise TypeError(
            f"Don't know how to multiply {type(other)} with Matrix"
        )
    if len(self.rows[0]) != len(other.rows):
        raise ValueError(
            "Matrix dimensions don't match"
        )
    rows = [[0 for _ in other.rows[0]] for _ in self.rows]
    for i in range(len(self.rows)):
        for j in range(len(other.rows[0])):
            for k in range(len(other.rows)):
                rows[i][j] += self.rows[i][k] * other.rows[k][j]
    return Matrix(rows)

The last overloaded operator is the most complex one. This is the * operator, which is implemented through the __mul__() method. In linear algebra, matrices don't have the same multiplication operation as real numbers. Two matrices can be multiplied if the first matrix has a number of columns equal to the number of rows of the second matrix. The result of that operation is a new matrix where each element is a dot product of the corresponding row of the first matrix and the corresponding column of the second matrix.
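We can verify the implementation by multiplying a 1×2 matrix by a 2×1 matrix in an interactive session:

>>> product = Matrix([[1, 2]]) * Matrix([[3], [4]])
>>> product.rows
[[11]]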

Here we've built our own implementation of the matrix to present the idea of operator overloading. Although Python lacks a built-in type for matrices, you don't need to build them from scratch. The NumPy package is one of the best Python mathematical packages and, among other things, provides native support for matrix algebra. You can easily obtain the NumPy package from PyPI.

Comparison to C++

One programming language where operator overloading is particularly common is C++. It is a statically typed OOP language that is nothing like Python. Python has OOP elements and some mechanisms that, in essence, are similar to those of C++. These are mainly the existence of classes and class inheritance together with the ability to overload operators. But the way these mechanisms are implemented within the language is completely different. And that's why comparing those two languages is so fascinating.

C++, in contrast to Python, has multiple coexisting polymorphism mechanisms. The main mechanism is through subtyping, which is also available in Python. The second major type of polymorphism in C++ is ad hoc polymorphism through function overloading. Python lacks a direct counterpart of that feature.

Function overloading in C++ allows you to have multiple implementations of the same function depending on input arguments. It means that you can have two functions or methods sharing the same name but differing in the number and/or types of their arguments. As C++ is a statically typed language, the types of arguments are always known in advance and the choice of the exact implementation happens at compile time.

To make it even more flexible, function overloading can be used together with operator overloading. The use case for such overloading coexistence can be better understood if we bring back the matrix multiplication use case. We know that two matrices can be multiplied together and we've learned how to do that in the previous section. But linear algebra also allows you to multiply a matrix with a scalar type like a real number. This operation results in a new matrix where every element has been multiplied by the scalar. In code, that would mean essentially another implementation of the multiplication operator.

In C++, you can simply provide multiple coexisting * operator overloading functions. The following is an example of C++ function signatures for overloaded operators that could allow various matrix and scalar multiplication implementations:

Matrix operator*(const Matrix& lhs, const Matrix& rhs)
Matrix operator*(const Matrix& lhs, const int& rhs)
Matrix operator*(const Matrix& lhs, const float& rhs)
Matrix operator*(const int& lhs, const Matrix& rhs)
Matrix operator*(const float& lhs, const Matrix& rhs)

Python is a dynamically typed language, and that's the main reason why it doesn't have function overloading as in C++. If we want to implement * operator overloading on the Matrix class that supports both matrix multiplication and scalar multiplication, we need to verify the operator input type at runtime. This can be done with the built-in isinstance() function as in the following example:

from numbers import Number  # abstract base class matching all numeric types

def __mul__(self, other):
    if isinstance(other, Matrix):
        ...
    elif isinstance(other, Number):
        # Scalar multiplication: multiply every element by the number
        return Matrix([
            [item * other for item in row]
            for row in self.rows
        ])
    else:
        raise TypeError(f"Can't multiply {type(other)} with Matrix")

Another major difference is that in C++, operator overloading can be done through free functions (as in the signatures above) and not only through class methods, while in Python, the operator is always resolved from a dunder method of one of the operands. This difference can again be illustrated using the example of scalar multiplication. The previous example allowed us to multiply a matrix by an integer number in the following form:

Matrix([[1, 1], [2, 2]]) * 3

This will work because the overloaded operator implementation is resolved from the left operand. On the other hand, the following expression will result in a TypeError:

3 * Matrix([[1, 1], [2, 2]])

In C++, you can provide multiple versions of operator overloading that cover all combinations of operand types for the * operator. In Python, the workaround for that problem is providing the __rmul__() method. This method is resolved from the right-side operand when the left-side operand does not support the operation (its __mul__() method is missing or returns NotImplemented). Most infix operators have such right-side implementation alternatives. The following is an example of the __rmul__() method for the Matrix class that allows you to perform scalar multiplication with a left-hand side number argument:

    def __rmul__(self, other):
        if isinstance(other, Number):
            return self * other
        return NotImplemented
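
With both __mul__() (in the extended version shown above) and __rmul__() in place, scalar multiplication works regardless of the operand order:

>>> tripled = 3 * Matrix([[1, 1], [2, 2]])
>>> tripled.rows
[[3, 3], [6, 6]]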

As you see, it still requires the use of type checking through the isinstance() function, so operator overloading should be used very cautiously, especially if overloaded operators receive a completely new meaning that is not in line with their original purpose.

The need to provide alternative overloaded implementations of an operator depending on a single operand's type is usually a sign that the operator has lost its clear meaning. For instance, matrix multiplication and scalar multiplication are mathematically two distinct operations. They have different properties; for example, scalar multiplication is commutative while matrix multiplication isn't. Providing an overloaded operator for a custom class that has multiple internal implementations can quickly lead to confusion, especially in code that deals with math problems.

We were deliberately silent about the fact that Python actually has a dedicated matrix multiplication operator, despite the fact that it doesn't have a built-in matrix type. That was just to better showcase the danger and complexities of overusing operator overloading. The dedicated operator for matrix multiplication is @, and the potential confusion between scalar and matrix multiplication was actually one of the main reasons this operator was introduced.
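If we wanted our Matrix class to support that dedicated operator, we could implement the __matmul__() dunder method that backs it. The following is a minimal sketch that simply delegates to the matrix multiplication logic we already implemented for the * operator:

def __matmul__(self, other):
    # The @ operator is reserved for matrix-matrix multiplication,
    # so reject any other operand type.
    if not isinstance(other, Matrix):
        return NotImplemented
    return self * other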

In many programming languages, operator overloading can be considered a special case of function and method overloading and these usually come in a pair. Surprisingly, Python has operator overloading but doesn't offer real function and method overloading. It offers different patterns to fill that gap. We will discuss them in the next section.

Function and method overloading

A common feature of many programming languages is function and method overloading. It is another type of polymorphism mechanism. Overloading allows you to have multiple implementations of a single function by using different call signatures. Either a language compiler or interpreter is able to select a matching implementation based on the set of function call arguments provided. Function overloading is usually resolved based on:

  • Function arity (number of parameters): Two function definitions can share a function name if their signatures expect a different number of parameters.
  • Types of parameters: Two function definitions can share a function name if their signatures expect different types of parameters.

As already stated in the Operator overloading section, Python lacks an overloading mechanism for functions and methods other than operator overloading. If you define multiple functions in a single module that share the same name, the latest definition will always shadow all previous ones.

If there is a need to provide several function implementations that behave differently depending on the type or number of arguments provided, Python offers several alternatives:

  • Using methods and/or subclassing: Instead of relying on a function to distinguish the parameter type, you can bind it to a specific type by defining it as a method of that type.
  • Using argument and keyword argument unpacking: Python allows for some flexibility regarding function signatures to support a variable number of arguments via *args and **kwargs patterns (also known as variadic functions).
  • Using type checking: The isinstance() function allows us to test input arguments against specific types and base classes to decide how to handle them.

Of course, each of the above options has some limitations. Pushing function implementation directly to class definitions as methods will not make any sense if said method doesn't constitute unique object behavior. Argument and keyword argument unpacking can make function signatures vague and hard to maintain.

Very often the most reliable and readable substitute for function overloading in Python is simply type checking. We've already seen this technique in action when discussing operator overloading. Let's recall the __mul__() method that was able to distinguish between matrix and scalar multiplication:

def __mul__(self, other):
    if isinstance(other, Matrix):
        ...
    elif isinstance(other, Number):
        ...
    else:
        raise TypeError(f"Can't multiply {type(other)} with Matrix")

As you can see, something that in a statically typed language would have to be done through function overloading, in Python can be resolved with a simple isinstance() call. That can be understood as an upside rather than a downside of Python. Still, this technique is convenient only for a small number of call signatures. When the number of supported types grows, it is often better to use more modular patterns. Such patterns rely on single-dispatch functions.

Single-dispatch functions

In situations when an alternative to function overloading is required and the number of alternative function implementations is really large, using multiple if isinstance(...) clauses can quickly get out of hand. Good design practice dictates writing small, single-purpose functions. One large function that branches over several types to handle input arguments differently is rarely a good design.

The Python Standard Library provides a convenient alternative. The functools.singledispatch() decorator allows you to register multiple implementations of a function. Those implementations can take any number of arguments but implementations will be dispatched depending on the type of the first argument. Single dispatch starts with a definition of a function that will be used by default for any non-registered type. Let's assume that we need a function that can output various variables in human-readable format for the purpose of a larger report being displayed in the console output. By default, we could use the f-string to denote a raw value in string format:

from functools import singledispatch
@singledispatch
def report(value):
    return f"raw: {value}"

From there, we can start registering different implementations for various types using the report.register() decorator. That decorator is able to read function argument type annotations to register specific type handlers. Let's say we want datetime objects to be reported in ISO format:

from datetime import datetime
@report.register
def _(value: datetime):
    return f"dt: {value.isoformat()}"

Note that we used the _ token as the actual function name. That serves two purposes. First, it is a convention for names of objects that are not supposed to be used explicitly. Second, if we used the report name instead, we would shadow the original function, thus losing the ability to access it and register new types.

Let's define a couple more type handlers:

from numbers import Real
@report.register
def _(value: complex):
    return f"complex: {value.real}{value.imag:+}j"
@report.register
def _(value: Real):
    return f"real: {value:f}"

Note that typing annotations aren't necessary but we've used them as an element of good practice. If you don't want to use typing annotations, you can specify the registered type as the register() method argument as in the following example:

@report.register(complex)
def _(value):
    return f"complex: {value.real}{value.imag:+}j"
@report.register(Real)
def _(value):
    return f"real: {value:f}"

If we tried to verify the behavior of our collection of single-dispatch implementations in an interactive session, we would get an output like the following:

>>> report(datetime.now())
'dt: 2020-12-12T00:22:31.690377'
>>> report(100-30j)
'complex: 100.0-30.0j'
>>> report(9001)
'real: 9001.000000'
>>> report("January")
'raw: January'
>>> for key, value in report.registry.items():
...     print(f"{key} -> {value}")
...
<class 'object'> -> <function report at 0x7fdfd6929a60>
<class 'datetime.datetime'> -> <function _ at 0x7fdfd69a5af0>
<class 'complex'> -> <function _ at 0x7fdfd6993d30>
<class 'float'> -> <function _ at 0x7fdfd6d7ab80>
<class 'int'> -> <function _ at 0x7fdfd6d7ab80>

As we see, the report() function is now an entry point to a collection of registered functions. Whenever it is called with an argument, it looks in the registry mapping stored in report.registry. There's always at least one key that maps the object type to the default implementation of the function.

Additionally, there is a variation of the single-dispatch mechanism dedicated to methods. Methods always receive the current object instance as their first argument. That means the functools.singledispatch() decorator would not be effective, as the first argument of a method is always of the same type. The functools.singledispatchmethod() decorator keeps that calling convention in mind and allows you to register multiple type-specific implementations on methods as well. It dispatches on the type of the first non-self, non-cls argument:

from functools import singledispatchmethod
class Example:
    @singledispatchmethod
    def method(self, argument):
        return "default implementation"
    @method.register
    def _(self, argument: float):
        return "float implementation"

Remember that while the single-dispatch mechanism is a form of polymorphism that resembles function overloading, it isn't exactly the same. You cannot use it to provide several implementations of a function dispatched over the types of multiple arguments, and the Python Standard Library currently lacks such a multiple-dispatch utility.

Data classes

As we learned from the Class instance initialization section, the canonical way to declare class instance attributes is through assigning them in the __init__() method as in the following example:

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

Let's assume we are building a program that does some geometric computation and Vector is a class that allows us to hold information about two-dimensional vectors. We will display the data of the vectors on the screen and perform common mathematical operations, such as addition, subtraction, and equality comparison. We already know that we can use special methods and operator overloading to achieve that goal in a convenient way. We can implement our Vector class as follows:

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __add__(self, other):
        """Add two vectors using + operator"""
        return Vector(
            self.x + other.x,
            self.y + other.y,
        )
    def __sub__(self, other):
        """Subtract two vectors using - operator"""
        return Vector(
            self.x - other.x,
            self.y - other.y,
        )
    def __repr__(self):
        """Return textual representation of vector"""
        return f"<Vector: x={self.x}, y={self.y}>"
    def __eq__(self, other):
        """Compare two vectors for equality"""
        return self.x == other.x and self.y == other.y

The following is the interactive session example that shows how it behaves when used with common operators:

>>> Vector(2, 3)
<Vector: x=2, y=3>
>>> Vector(5, 3) + Vector(1, 2)
<Vector: x=6, y=5>
>>> Vector(5, 3) - Vector(1, 2)
<Vector: x=4, y=1>
>>> Vector(1, 1) == Vector(2, 2)
False
>>> Vector(2, 2) == Vector(2, 2)
True

The preceding vector implementation is quite simple, but it involves a lot of code that could be avoided. Our Vector class is focused on data. Most of the behavior it provides is centered around creating new Vector instances through mathematical operations. It doesn't provide complex initialization or custom attribute access patterns. Things like equality comparison, string representation, and attribute initialization will look very similar and repetitive across various classes focused on data.

If your program uses many similar simple classes focused on data that do not require complex initialization, you'll end up writing a lot of boilerplate code just for the __init__(), __repr__(), and __eq__() methods.

With the dataclasses module, we can make our Vector class code a lot shorter:

from dataclasses import dataclass
@dataclass
class Vector:
    x: int
    y: int
    def __add__(self, other):
        """Add two vectors using + operator"""
        return Vector(
            self.x + other.x,
            self.y + other.y,
        )
    def __sub__(self, other):
        """Subtract two vectors using - operator"""
        return Vector(
            self.x - other.x,
            self.y - other.y,
        )

The dataclass class decorator reads attribute annotations of the Vector class and automatically creates the __init__(), __repr__(), and __eq__() methods. The default equality comparison assumes that the two instances are equal if all their respective attributes are equal to each other.
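If we repeat part of our earlier interactive session with the data class version, we will see that the generated methods behave like our handwritten ones (note the slightly different auto-generated representation):

>>> Vector(5, 3) + Vector(1, 2)
Vector(x=6, y=5)
>>> Vector(2, 2) == Vector(2, 2)
True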

But that's not all. Data classes offer many useful features. They can easily be made compatible with other Python protocols, too. Let's assume we want our Vector class instances to be immutable. Thanks to this, they could be used as dictionary keys and as content values in sets. You can do this by simply adding a frozen=True argument to the dataclass decorator, as in the following example:

from dataclasses import dataclass
@dataclass(frozen=True)
class FrozenVector:
    x: int
    y: int

Such a frozen Vector data class becomes completely immutable, so you won't be able to modify any of its attributes. You can still add and subtract two Vector instances as in our example; these operations simply create new Vector objects.
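Any attempt to modify an attribute of a frozen instance raises an exception:

>>> frozen = FrozenVector(1, 2)
>>> frozen.x = 5
Traceback (most recent call last):
  ...
dataclasses.FrozenInstanceError: cannot assign to field 'x'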

We've learned already about the dangers of assigning default values to class attributes in the main class body instead of the __init__() method. The dataclasses module offers a useful alternative through the field() constructor. This constructor allows you to specify both mutable and immutable default values for data class attributes in a sane and secure way, without risking leaking state between class instances. Static and immutable default values are provided using the field(default=value) call. Mutable default values should always be provided by passing a type constructor using the field(default_factory=constructor) call. The following is an example of a data class with two attributes that have their default values assigned through the field() constructor:

from dataclasses import dataclass, field
@dataclass
class DataClassWithDefaults:
    immutable: str = field(default="this is static default value")
    mutable: list = field(default_factory=list)

Once a data class attribute has its default assigned, the corresponding initialization argument for that field becomes optional. The following transcript presents various ways of initializing DataClassWithDefaults class instances:

>>> DataClassWithDefaults()
DataClassWithDefaults(immutable='this is static default value', mutable=[])
>>> DataClassWithDefaults("This is immutable")
DataClassWithDefaults(immutable='This is immutable', mutable=[])
>>> DataClassWithDefaults(None, ["this", "is", "list"])
DataClassWithDefaults(immutable=None, mutable=['this', 'is', 'list'])

Data classes are similar in nature to structs in C or Go. Their main purpose is to hold data and provide shortcuts for the otherwise tedious initialization of instance attributes. But they should not be used as a basis for every possible custom class. If your class isn't meant to represent the data, and/or requires custom or complex state initialization, you should rather use the default way of initialization: through the __init__() method.

Python is not only about OOP. It supports other programming paradigms as well. One of those paradigms is functional programming, which concentrates on the evaluation of functions. Pure functional programming languages are usually drastically different than their OOP counterparts. But multiparadigm programming languages try to take the best of many programming styles. That's also true for Python. In the next section, we will review a few elements of Python that support functional programming. You will soon notice that this paradigm in Python is actually built over the foundation laid by OOP.

Functional programming

One of the great things about programming in Python is that you are never constrained to a single way of thinking about your programs. There are always various ways to solve a given problem, and sometimes the best one requires an approach that is slightly different from the one that would be the most obvious. Sometimes, this approach requires the use of declarative programming. Fortunately, Python, with its rich syntax and large standard library, offers features of functional programming, and functional programming is one of the main paradigms of declarative programming.

Functional programming is a paradigm where the program flow is achieved mainly through the evaluation of (mathematical) functions rather than through a series of steps that change the state of the program. Purely functional programs avoid the changing of state (side effects) and the use of mutable data structures.

One of the best ways to better understand the general concept of functional programming is by familiarizing yourself with the basic terms of functional programming:

  • Side effects: A function is said to have a side effect if it modifies the state outside of its local environment. In other words, a side effect is any observable change outside of the function scope that happens as a result of a function call. An example of such side effects could be the modification of a global variable, the modification of an attribute of an object that is available outside of the function scope, or saving data to some external service. Side effects are the core of the concept of OOP, where class instances are objects that are used to encapsulate the state of an application, and methods are functions bound to those objects that are supposed to manipulate the state of these objects. Procedural programming also heavily relies on side effects.
  • Referential transparency: When a function or expression is referentially transparent, it can be replaced with the value that corresponds to its output without changing the behavior of the program. So, a lack of side effects is a requirement for referential transparency, but not every function that lacks side effects is a referentially transparent function. For instance, Python's built-in pow(x, y) function is referentially transparent, because it lacks side effects, and for every x and y argument, it can be replaced with the value of x ** y. On the other hand, the datetime.now() constructor method of the datetime type does not seem to have any observable side effects but will return a different value every time it is called. So, it is referentially opaque.
  • Pure functions: A pure function is a function that does not have any side effects and that always returns the same value for the same set of input arguments. In other words, it is a function that is referentially transparent. Every mathematical function is, by definition, a pure function. Analogously, a function that leaves a trace of its execution for the outside world (for instance, by modifying received objects) is not a pure function.
  • First-class functions: Language is said to contain first-class functions if functions in this language can be treated as any other value or entity. First-class functions can be passed as arguments to other functions, returned as function return values, and assigned to variables. In other words, a language that has first-class functions is a language that treats functions as first-class citizens. Functions in Python are first-class functions.

Using these concepts, we could describe a purely functional language as a language that:

  • Has first-class functions
  • Is concerned only with pure functions
  • Avoids any state modification and side effects

Python, of course, is not a purely functional programming language, and it would be really hard to imagine a useful Python program that uses only pure functions without any side-effects. On the other hand, Python offers a large variety of features that, for years, were only accessible in purely functional languages, like:

  • Lambda functions and first-class functions
  • map(), filter(), and reduce() functions
  • Partial objects and functions
  • Generators and generator expressions

Those features make it possible to write substantial amounts of Python code in a functional way, even though Python isn't purely functional.

Lambda functions

Lambda functions are a very popular programming concept that is especially prominent in functional programming. In other programming languages, lambda functions are sometimes known as anonymous functions, lambda expressions, or function literals. Lambda functions are anonymous functions that don't have to be bound to any identifier (variable).

At some point in Python 3's development, there was a heated discussion about removing lambda functions together with the map(), filter(), and reduce() functions. You can read Guido van Rossum's article about the reasons for considering their removal at https://www.artima.com/weblogs/viewpost.jsp?thread=98196.

Lambda functions in Python can be defined only using expressions. The syntax for lambda functions is as follows:

lambda <arguments>: <expression>

The best way to present the syntax of lambda functions is by comparing a "normal" function definition with its anonymous counterpart. The following is a simple function that returns the area of a circle of a given radius:

import math
def circle_area(radius):
    return math.pi * radius ** 2

The same function expressed as a lambda function would take the following form:

lambda radius: math.pi * radius ** 2

Lambda functions are anonymous, but it doesn't mean they cannot be referred to using an identifier. Functions in Python are first-class objects, so whenever you use a function name, you're actually using a variable that is a reference to the function object. As with any other function, lambda functions are first-class citizens, so they can also be assigned to a new variable. Once assigned to a variable, they are seemingly indistinguishable from other functions, except for some metadata attributes. The following transcripts from interactive interpreter sessions illustrate this:

>>> import math
>>> def circle_area(radius):
...     return math.pi * radius ** 2
...
>>> circle_area(42)
5541.769440932395
>>> circle_area
<function circle_area at 0x10ea39048>
>>> circle_area.__class__
<class 'function'>
>>> circle_area.__name__
'circle_area'
>>> circle_area = lambda radius: math.pi * radius ** 2
>>> circle_area(42)
5541.769440932395
>>> circle_area
<function <lambda> at 0x10ea39488>
>>> circle_area.__class__
<class 'function'>
>>> circle_area.__name__
'<lambda>'

The main use for lambda expressions is to define contextual one-off functions that won't have to be reused elsewhere. To better understand their potential, let's imagine that we have an application that stores information about people. To represent a record of a person's data, we could use the following data class:

from dataclasses import dataclass
@dataclass
class Person:
    age: int
    weight: int
    name: str

Now let's imagine that we have a set of such records and we want to sort them by different fields. Python provides a sorted() function that is able to sort any list as long as elements can be compared with at least "less than" comparison (the < operator). We could define custom operator overloading on the Person class, but we would have to know in advance what field our records will be sorted on.

Thankfully, the sorted() function accepts the key keyword argument, which allows you to specify a function that will transform every element of the input into a value that can be naturally sorted by the function. Lambda expressions allow you to define such sorting keys on demand. For instance, sorting people by age can be done using the following call:

sorted(people, key=lambda person: person.age)

The above behavior of the sorted() function presents a common pattern of allowing code to accept a callable argument that resolves some injected behavior. Lambda expressions are often a convenient way of defining such behaviors.

The map(), filter(), and reduce() functions

The map(), filter(), and reduce() functions are three built-in functions that are most often used in conjunction with lambda functions. They are commonly used in functional-style Python programming because they allow us to declare transformations of any complexity, while simultaneously avoiding side effects.

In Python 2, all three functions were available as default built-in functions that did not require additional imports. In Python 3, the reduce() function was moved to the functools module, so it requires an additional import.

map(func, iterable, ...) applies the func function argument to every item of iterable. You can pass more iterables to the map() function. If you do so, map() will consume elements from each iterable simultaneously. The func function will receive as many arguments as there are iterables on every map step. If iterables are of different sizes, map() will stop when the shortest one is exhausted. It is worth remembering that map() does not evaluate the whole result at once, but returns an iterator so that every result item can be evaluated only when it is necessary.

The following is an example of map() being used to calculate the squares of the first 10 integers starting from 0:

>>> map(lambda x: x**2, range(10))
<map object at 0x10ea09cf8>
>>> list(map(lambda x: x**2, range(10)))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The following is an example of the map() function being used over multiple iterables of different sizes:

>>> mapped = list(map(print, range(5), range(4), range(5)))
0 0 0
1 1 1
2 2 2
3 3 3
>>> mapped
[None, None, None, None]

filter(func, iterable) works similarly to map() by evaluating input elements one by one. Unlike map(), the filter() function does not transform input elements into new values, but allows us to filter out those input values that do not meet the predicate defined by the func argument. The following are examples of the filter() function's usage:

>>> evens = filter(lambda number: number % 2 == 0, range(10))
>>> odds = filter(lambda number: number % 2 == 1, range(10))
>>> print(f"Even numbers in range from 0 to 9 are: {list(evens)}")
Even numbers in range from 0 to 9 are: [0, 2, 4, 6, 8]
>>> print(f"Odd numbers in range from 0 to 9 are: {list(odds)}")
Odd numbers in range from 0 to 9 are: [1, 3, 5, 7, 9]
>>> animals = ["giraffe", "snake", "lion", "squirrel"]
>>> animals_s = filter(lambda animal: animal.startswith('s'), animals)
>>> print(f"Animals that start with letter 's' are: {list(animals_s)}")
Animals that start with letter 's' are: ['snake', 'squirrel']

The reduce(func, iterable) function works in completely the opposite way to map(). As the name suggests, this function can be used to reduce an iterable to a single value. Instead of taking items of iterable and mapping them to the func return values in one-by-one fashion, it cumulatively performs the operation specified by func over all iterable items. So, for the following inputs of reduce():

reduce(func, [a, b, c, d])

The return value would be equal to:

func(func(func(a, b), c), d)

Let's consider the following example of reduce() calls being used to sum values of elements contained in various iterable objects:

>>> from functools import reduce
>>> reduce(lambda a, b: a + b, [2, 2]) 
4
>>> reduce(lambda a, b: a + b, [2, 2, 2]) 
6
>>> reduce(lambda a, b: a + b, range(100)) 
4950

One interesting aspect of map() and filter() is that they can work on infinite sequences. Of course, evaluating an infinite sequence to a list type or trying to ordinarily loop over such a sequence will result in a program that never ends. The count() function from itertools is an example of a function that returns infinite iterables. It simply counts from 0 to infinity. If you try to loop over it as in the following example, your program will never stop:

from itertools import count
for i in count():
    print(i)

However, the return values of map() and filter() are iterators. Instead of using a for loop, you can consume consecutive elements of the iterator using the next() function. Let's take a look again at our previous map() call that generated consecutive integer squares starting from 0:

map(lambda x: x**2, range(n))

The range() function returns a bounded iterable of n items. If we don't know how many items we want to generate, we can simply replace it with count():

map(lambda x: x**2, count())

From now on we can start consuming consecutive squares. We can't use a for loop because that would never end. But we can use next() numerous times and consume items one at a time:

sequence = map(lambda x: x**2, count())
next(sequence)
next(sequence)
next(sequence)
...

Unlike the map() and filter() functions, the reduce() function needs to evaluate all input items in order to return its value, as it does not yield intermediary results. This means that it cannot be used on infinite sequences.

Partial objects and partial functions

Partial objects are loosely related to the concept of partial functions in mathematics. A partial function is a generalization of a mathematical function that is not required to map every possible input value (every element of its domain) to a result. In Python, partial objects can be used to slice the possible input range of a given function by setting some of its arguments to a fixed value.

In the previous sections, we used the x ** 2 expression to get the square value of x. Python provides a built-in function called pow(x, y) that can calculate any power of any number. So, our lambda x: x ** 2 expression is a partial function of the pow(x, y) function, because we have limited the domain values for y to a single value, 2. The partial() function from the functools module provides an alternative way to easily define such partial functions without the need for lambda expressions, which can sometimes become unwieldy.

Let's say that we now want to create a slightly different partial function out of pow(). Last time, we generated squares of consecutive numbers. Now, let's narrow the domain of other input arguments and say we want to generate consecutive powers of the number two—so, 1, 2, 4, 8, 16, and so on.

The signature of a partial object constructor is partial(func, *args, **keywords). The partial object will behave exactly like func, but its input arguments will be pre-populated with *args (starting from the leftmost) and **keywords. The pow(x, y) function does not support keyword arguments, so we have to pre-populate the leftmost x argument as follows:

>>> from functools import partial
>>> powers_of_2 = partial(pow, 2)
>>> powers_of_2(2)
4
>>> powers_of_2(5)
32
>>> powers_of_2(10)
1024

Note that you don't need to assign your partial object to any identifier if you don't want to reuse it. You can successfully use it to define one-off functions in the same way that you would use lambda expressions.

The itertools module is a treasury of helpers and utilities for iterating over any type of iterable objects in various ways. It provides various functions that, among other things, allow us to cycle containers, group their contents, split iterables into chunks, and chain multiple iterables into one. Every function in that module returns iterators. If you are interested in functional-style programming in Python, you should definitely familiarize yourself with this module. You can find the documentation of the itertools module at https://docs.python.org/3/library/itertools.html.
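For instance, the islice() function from itertools gives a lazy way to take a bounded slice of the infinite sequences discussed above, without any manual next() calls:

>>> from itertools import count, islice
>>> squares = map(lambda x: x**2, count())
>>> list(islice(squares, 5))
[0, 1, 4, 9, 16]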

Generators

Generators provide an elegant way to write simple and efficient code for functions that return a sequence of elements. Based on the yield statement, they allow you to pause a function and return an intermediate result. The function saves its execution context and can be resumed later, if necessary.

For instance, the function that returns consecutive numbers of the Fibonacci sequence can be written using a generator syntax. The following code is an example that was taken from the PEP 255 (Simple Generators) document:

def fibonacci():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

You can retrieve new values from generators as if they were iterators, that is, by using the next() function or for loops:

>>> fib = fibonacci()
>>> next(fib)
1
>>> next(fib)
1
>>> next(fib)
2
>>> for item in fibonacci():
...    print(item)
...    if item > 10:
...        break
...
1
1
2
3
5
8
13

Our fibonacci() function returns a generator object, a special iterator that knows how to save its execution context. It can be iterated indefinitely, yielding the next element of the sequence each time. The syntax is concise, and the infinite nature of the algorithm does not disturb the readability of the code. There is no need to provide a way to make the function stoppable. In fact, it looks similar to how the sequence-generating function would be designed in pseudocode.

In many cases, the resources required to process one element are lower than the resources required to store a whole sequence. Therefore, memory use can be kept low, making the program more efficient. For instance, the Fibonacci sequence is infinite, and yet the generator that produces it does not require an infinite amount of memory to provide the values one by one and, theoretically, could work ad infinitum. A common use case is to stream data buffers with generators (for example, from files). They can be paused, resumed, and stopped whenever necessary, at any stage of the data processing pipeline, without any need to load whole datasets into the program's memory.

In functional programming, generators allow you to write stateful sequence producers while keeping the appearance of stateless functions: the intermediary state is kept inside the suspended generator instead of being saved through side effects.

Generator expressions

Generator expressions are another syntax element that allows you to write code in a more functional way. Their syntax is similar to that of the comprehensions used with dictionary, set, and list literals. A generator expression is denoted by parentheses, like in the following example:

(item for item in iterable_expression)

Generator expressions can be used as input arguments in any function that accepts iterables. They also allow if clauses to filter specific elements the same way as list, dictionary, and set comprehensions. This means that you can often replace complex map() and filter() constructions with more readable and compact generator expressions.

Syntactically, generator expressions differ from other comprehension expressions only in the parentheses that surround them. Their main advantage is that they evaluate only one item at a time. So, if you process an arbitrarily long iterable expression, a generator expression may be a good fit, as it doesn't need to fit the whole collection of intermediary results into the program's memory.
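For instance, a map()/filter() combination that computes the squares of even numbers can be expressed as a single, arguably more readable, generator expression:

squares_of_evens = map(
    lambda x: x**2,
    filter(lambda x: x % 2 == 0, range(10)),
)
# The equivalent generator expression:
squares_of_evens = (x**2 for x in range(10) if x % 2 == 0)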

Lambdas, map, reduce, filter, partial functions, and generators are focused on presenting program logic as an evaluation of function call expressions. Another important element of functional programming is having first-class functions. In Python, all functions are objects and like any other object, they can be inspected and modified at runtime. It allows for a useful syntax feature called function decorators.

Decorators

A decorator is generally a callable expression that accepts a single argument when called (the decorated function) and returns another callable object.

Prior to Python 3.9, only named expressions could be used with a dedicated decorator syntax. Starting from Python 3.9, any expression is a valid target for a dedicated decorator syntax, including lambda expressions.

While decorators are often discussed in the scope of methods and functions, they are not limited to them. In fact, anything that is callable (any object that implements the __call__ method is considered callable) can be used as a decorator, and often, objects returned by them are not simple functions but are instances of more complex classes that are implementing their own __call__ method.

The decorator syntax is simply syntactic sugar. Consider the following decorator usage:

@some_decorator 
def decorated_function(): 
    pass

This can always be replaced by an explicit decorator call and function reassignment:

def decorated_function(): 
    pass 
decorated_function = some_decorator(decorated_function) 

However, the latter is less readable and also very hard to understand if multiple decorators are used on a single function.

A decorator does not even need to return a callable!

As a matter of fact, any function can be used as a decorator, because Python does not enforce the return type of decorators. So, using some function as a decorator that accepts a single argument but does not return a callable object, let's say str, is completely valid in terms of syntax. This will eventually fail if you try to call an object that's been decorated this way. This part of the decorator syntax creates a field for some interesting experimentation.
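The following (purely illustrative) example shows such a decorator: it accepts a function but returns a plain string, so the decorated name is no longer callable:

def as_string(function):
    # Not a typical decorator: it replaces the function
    # with a string instead of another callable.
    return f"Decorated {function.__name__}"

@as_string
def greet():
    ...

print(greet)    # prints: Decorated greet
# greet() would now raise TypeError: 'str' object is not callable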

Decorators are elements of the programming language inspired by aspect-oriented programming and the decorator design pattern. The main use case is to conveniently enhance an existing function implementation with extra behavior coming from other aspects of your application.

Consider the following example, taken from the Flask framework documentation:

@app.route('/secret_page')
@login_required
def secret_page():
    pass

secret_page() is a view function that presumably is supposed to return a secret page. It is decorated with two decorators. app.route() assigns a URI route to the view function and login_required() enforces user authentication.

According to the single-responsibility principle, functions should be rather small and single-purpose. In our Flask application, the secret_page() view function would be responsible for preparing the HTTP response that can be later rendered in a web browser. It probably shouldn't deal with things like parsing HTTP requests, verifying user credentials, and so on.

As the name suggests, the secret_page() function returns something that is secret and shouldn't be visible to everyone. Verifying user credentials isn't part of the view function's responsibility, but it is part of the general idea of "a secret page." The @login_required decorator allows you to bring the aspect of user authentication close to the view function. It makes the application more concise and the intent of the programmer more readable.

Let's look further at the actual example of the @login_required decorator from the Flask framework documentation:

from functools import wraps
from flask import g, request, redirect, url_for
def login_required(f):
    @wraps(f)
    def decorated_function(*args, **kwargs):
        if g.user is None:
            return redirect(url_for('login', next=request.url))
        return f(*args, **kwargs)
    return decorated_function

The @wraps decorator allows you to copy decorated function metadata like name and type annotations. It is a good practice to use the @wraps decorator in your own decorators as it eases debugging and gives access to original function type annotations.

As we can see, this decorator returns a new decorated_function() function that at first verifies if the global g object has a valid user assigned. That's a common way of testing whether the user has been authenticated in Flask. If the test succeeds, the decorated function calls the original function by returning f(*args, **kwargs). If the login test fails, the decorated function will redirect the browser to the login page.

As we can see, the login_required() decorator conveys a little bit more than simple check-or-fail behavior. That makes decorators a great mechanism of code reuse. The login requirement may be a common aspect of applications, but the implementation of that aspect can change over time. Decorators offer a convenient way to pack such aspects into portable behaviors that can be easily added on top of existing functions.

We will use and explain decorators in more detail in Chapter 8, Elements of Metaprogramming, where we will discuss decorators as a metaprogramming technique.

Enumerations

There are common programming features that are found in many programming languages regardless of the dominant programming paradigm. One such feature is enumerated types that have a finite number of named values. They are especially useful for encoding a closed set of values for variables or function arguments.

One of the special handy types found in the Python Standard Library is the Enum class from the enum module. This is a base class that allows you to define symbolic enumerations, similar in concept to the enumerated types found in many other programming languages (C, C++, C#, Java, and many more) that are often denoted with the enum keyword.

In order to define your own enumeration in Python, you will need to subclass the Enum class and define all enumeration members as class attributes. The following is an example of a simple Python enum:

from enum import Enum
class Weekday(Enum):
    MONDAY = 0
    TUESDAY = 1
    WEDNESDAY = 2
    THURSDAY = 3
    FRIDAY = 4
    SATURDAY = 5
    SUNDAY = 6

The Python documentation defines the following nomenclature for enum:

  • enumeration or enum: This is the subclass of the Enum base class. Here, it would be Weekday.
  • member: This is the attribute you define in the Enum subclass. Here, it would be Weekday.MONDAY, Weekday.TUESDAY, and so on.
  • name: This is the name of the Enum subclass attribute that defines the member. Here, it would be MONDAY for Weekday.MONDAY, TUESDAY for Weekday.TUESDAY, and so on.
  • value: This is the value assigned to the Enum subclass attribute that defines the member. Here, for Weekday.MONDAY it would be zero, for Weekday.TUESDAY it would be one, and so on, as the following transcript illustrates.
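
>>> Weekday.MONDAY
<Weekday.MONDAY: 0>
>>> Weekday.MONDAY.name
'MONDAY'
>>> Weekday.MONDAY.value
0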

You can use any type as the enum member value. If the member value is not important in your code, you can even use the auto() helper, which will be replaced with an automatically generated value. Here is a similar example written with the use of auto():

from enum import Enum, auto
class Weekday(Enum):
    MONDAY = auto()
    TUESDAY = auto()
    WEDNESDAY = auto()
    THURSDAY = auto()
    FRIDAY = auto()
    SATURDAY = auto()
    SUNDAY = auto()

Enumerations in Python are really useful in every place where some variable can take only a finite number of values/choices. For instance, they can be used to define the status of objects, as shown in the following example:

from enum import Enum, auto
class OrderStatus(Enum):
    PENDING = auto()
    PROCESSING = auto()
    PROCESSED = auto()
class Order:
    def __init__(self):
        self.status = OrderStatus.PENDING
    def process(self):
        if self.status == OrderStatus.PROCESSED:
            raise ValueError(
                ""Can't process order that has ""
                ""been already processed""
            )
        self.status = OrderStatus.PROCESSING
        ...
        self.status = OrderStatus.PROCESSED
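
Assuming the elided part of process() completes successfully, an interactive session might look like this:

>>> order = Order()
>>> order.status
<OrderStatus.PENDING: 1>
>>> order.process()
>>> order.status
<OrderStatus.PROCESSED: 3>
>>> order.process()
Traceback (most recent call last):
  ...
ValueError: Can't process order that has been already processed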

Another use case for enumerations is storing selections of non-exclusive choices. This is something that is often implemented using bit flags and bit masks in languages where the bit manipulation of numbers is very common, like C. In Python, this can be done in a more expressive and convenient way using the Flag base enumeration class:

from enum import Flag, auto
class Side(Flag):
    GUACAMOLE = auto()
    TORTILLA = auto()
    FRIES = auto()
    BEER = auto()
    POTATO_SALAD = auto()

You can combine such flags using bitwise operators (the | and & operators) and test for flag membership with the in keyword. Here are some examples of a Side enumeration:

>>> mexican_sides = Side.GUACAMOLE | Side.BEER | Side.TORTILLA
>>> bavarian_sides = Side.BEER | Side.POTATO_SALAD
>>> common_sides = mexican_sides & bavarian_sides
>>> Side.GUACAMOLE in mexican_sides
True
>>> Side.TORTILLA in bavarian_sides
False
>>> common_sides
<Side.BEER: 8>

Symbolic enumerations share some similarity with dictionaries and named tuples because they all map names/keys to values. The main difference is that the Enum definition is immutable and global. It should be used whenever there is a closed set of possible values that can't change dynamically during program runtime, and especially if that set should be defined only once and globally. Dictionaries and named tuples are data containers. You can create as many instances of them as you like.

Summary

In this chapter, we've looked at the Python language through the prism of different programming paradigms. Whenever it was sensible, we've tried to see how it compares to other programming languages that share similar features to see both strengths and weaknesses of Python.

We went pretty deep into the details of object-oriented programming concepts and extended our knowledge of supplementary paradigms like functional programming, so we are now fully prepared to start discussing topics on structuring and architecting whole applications.

The next chapter will cover that pretty extensively as it will be fully dedicated to various design patterns and methodologies.
